Meaningless comparisons lead to false optimism in medical machine learning
Authors
Abstract
A new trend in medicine is the use of algorithms to analyze big datasets, e.g., using everything your phone measures about you for diagnostics or monitoring. However, these algorithms are commonly compared against weak baselines, which may contribute to excessive optimism. To assess how well an algorithm works, scientists typically ask how well its output correlates with medically assigned scores. Here we perform a meta-analysis to quantify how the literature evaluates its algorithms for monitoring mental wellbeing. We find that the bulk of the literature (∼77%) uses meaningless comparisons that ignore the patient's baseline state. For example, having an algorithm that uses phone data to diagnose mood disorders would be useful. However, it is possible to explain over 80% of the variance of some mood measures in the population by simply guessing that each patient has their own average mood (the patient-specific baseline). Thus, an algorithm that just predicts that our mood is like it usually is can explain the majority of variance, but is, obviously, entirely useless. Comparing to the wrong (population) baseline has a massive effect on the perceived quality of algorithms and produces baseless optimism in the field. To solve this problem we propose "user lift", which reduces these systematic errors in the evaluation of personalized medical monitoring.
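The baseline effect described in the abstract can be illustrated with a small synthetic sketch. Everything here is an assumption made for illustration: the data are simulated (50 hypothetical patients, 30 mood scores each), the function names are invented, and `lift_over_personal_baseline` is only one plausible way to score a model against the patient-specific baseline. It is not claimed to be the paper's definition of "user lift". For brevity the personal mean is computed in-sample; a real evaluation would presumably estimate it from each patient's earlier data.

```python
import numpy as np


def r2_against_population_mean(y_true, y_pred):
    """Variance explained relative to predicting the population mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def personal_baseline(y, patient_ids):
    """Predict every observation as that patient's own mean score."""
    pred = np.empty_like(y, dtype=float)
    for pid in np.unique(patient_ids):
        mask = patient_ids == pid
        pred[mask] = y[mask].mean()
    return pred


def lift_over_personal_baseline(y_true, y_model, patient_ids):
    """Hypothetical score: mean per-patient relative reduction in squared
    error compared with the personal baseline (0 means no improvement).
    Assumes each patient's scores are not all identical."""
    base = personal_baseline(y_true, patient_ids)
    lifts = []
    for pid in np.unique(patient_ids):
        m = patient_ids == pid
        err_base = np.mean((y_true[m] - base[m]) ** 2)
        err_model = np.mean((y_true[m] - y_model[m]) ** 2)
        lifts.append((err_base - err_model) / err_base)
    return float(np.mean(lifts))


# Simulated data: patients differ a lot from each other (sd 2.0) but
# fluctuate only modestly around their own mean (sd 0.8).
rng = np.random.default_rng(0)
patient_ids = np.repeat(np.arange(50), 30)
patient_means = rng.normal(5.0, 2.0, size=50)
y = patient_means[patient_ids] + rng.normal(0.0, 0.8, size=patient_ids.size)

# A "model" that only ever predicts each patient's usual mood.
usual_mood = personal_baseline(y, patient_ids)

r2_population = r2_against_population_mean(y, np.full_like(y, y.mean()))
r2_personal = r2_against_population_mean(y, usual_mood)
lift_self = lift_over_personal_baseline(y, usual_mood, patient_ids)

print(f"R^2 of population-mean guess: {r2_population:.2f}")
print(f"R^2 of 'usual mood' guess:    {r2_personal:.2f}")
print(f"Lift of 'usual mood' guess over personal baseline: {lift_self:.2f}")
```

In this simulation the "usual mood" guess looks strong when judged against the population mean, yet scores no improvement once the comparison is made against each patient's own baseline, which is the gap the abstract warns about.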
Similar resources
Optimism in Active Learning
Active learning is the problem of interactively constructing the training set used in classification in order to reduce its size. It would ideally successively add the instance-label pair that decreases the classification error most. However, the effect of the addition of a pair is not known in advance. It can still be estimated with the pairs already in the training set. The online minimizatio...
Optimism in Active Learning with Gaussian Processes
In the context of Active Learning for classification, the classification error depends on the joint distribution of samples and their labels which is initially unknown. The minimization of this error requires estimating this distribution. Online estimation of this distribution involves a trade-off between exploration and exploitation. This is a common problem in machine learning for which multi...
Learning-Based Energy Management System for Scheduling of Appliances inside Smart Homes
Improper designs of the demand response programs can lead to numerous problems such as customer dissatisfaction and lower participation in these programs. In this paper, a home energy management system is designed which schedules appliances of smart homes based on the user’s specific behavior to address these issues. Two types of demand response programs are proposed for each house which are sh...
Detection of Glioblastoma Multiforme Tumor in Magnetic Resonance Spectroscopy Based on Support Vector Machine
Introduction: The brain tumor is an abnormal growth of tissue in the brain, which is one of the most important challenges in neurology. Brain tumors have different types. Some brain tumors are benign and some brain tumors are cancerous and malignant. Glioblastoma Multiforme (GBM) is the most common and deadliest malignant brain tumor in adults. The average survival rate for peo...
Over-optimism in bioinformatics research
The problem of "false research findings" in medical research has attracted much attention in the last few years (Ioannidis, 2005). One of the main problems, termed "fishing for significance" in the present letter, is that researchers often (consciously or subconsciously) report results that are in fact the product of an intensive optimization, i.e. of multiple comparisons. Such results are typ...